Abstract- This paper introduces ConAno, an NLP-assisted
text annotation tool designed to support dataset
preparation for sentiment analysis and aspect-based
sentiment analysis. ConAno is highly configurable,
allowing users to customize it easily for a
range of sentiment analysis tasks. Moreover, ConAno
enables the annotation of opinion terms in the context of
specific aspect-related sentiments. ConAno also
employs a systematic annotation process that progressively
categorizes aspects with increasing complexity. These
categories encompass several dimensions, including aspects and
their associated sentiments. Additionally, the tool supports
tagging multiple opinion terms linked to
each specific aspect. With its graphical
interface, the annotation process in ConAno relies solely on
simple mouse interactions, eliminating
the need for conventional keyboard inputs and commands.
ConAno is freely available, open-source
software that accelerates the annotation process,
typically reducing annotation time by 20-25%.
Owing to its adaptability, user-friendliness,
and efficiency, ConAno can
enhance precision and efficacy in sentiment analysis tasks.

Keywords— text annotation, configurable annotation tool, 
annotation tool, aspect annotation, sentiment annotation, aspect 
based sentiment annotation. 


I. INTRODUCTION 


Text annotation plays a pivotal role in the research area of
natural language processing (NLP) and machine learning by
converting unstructured textual data into structured 
informative datasets [1]. The process involves the strategic 
labeling of text segments with meaningful tags, labels, or 
annotations, thus empowering both human analysts and 
automated algorithms to extract valuable insights, elaborate 
semantic complexities, and derive actionable intelligence 
from the voluminous pool of textual information [2], [3]. In 
recent years, the rapid increase of textual data across diverse 
domains has underscored the significance of efficient and 
accurate text annotation methodologies [4]. 

Text annotations serve as the foundational bedrock for both 
the evaluation and training of cutting-edge NLP techniques 
[5]. However, while annotation remains a critical step in the
refinement of NLP techniques, it is also
a labor-intensive task that demands considerable resources
[2], [3], [6]. This reality is particularly evident in the case of
intricate tasks such as sentiment analysis and aspect-based 
sentiment analysis, where the nuanced interplay between 
language and emotion demands precise annotations. 

The demand for high-quality annotations increases with the
continually evolving research in NLP [5]. This has driven the
development of tools and methodologies aimed at expediting
the annotation process. However, these tools often show a
trade-off between enhancing efficiency and maintaining the
essential features of annotation quality and consistency.
Maintaining this balance is crucial to ensure the utility and
relevance of the resulting datasets for training and evaluating
state-of-the-art NLP models.

ConAno addresses these limitations and
challenges by introducing an innovative, configurable, and
NLP-assisted text annotation tool. ConAno is designed to
alleviate the substantial time and resource burdens associated
with annotation tasks while maintaining a steadfast
commitment to precision and coherence in annotations.
ConAno was initially tailored and tested for two
tasks: sentiment analysis and aspect-based
sentiment analysis. Sentiment analysis is a task in which
opinions and emotions are intricately interwoven with linguistic
expressions, whereas aspect-based sentiment analysis
examines sentiment at a finer granularity, at the level of
individual aspects.

This paper presents the creation, architecture, and 
performance of our NLP-assisted text annotation tool. We
describe its configurability, user interface, and the tangible
efficiency gains observed in real-world annotation scenarios.
Through empirical evaluation, we establish the efficacy of
ConAno in mitigating the resource-intensive nature
of annotation and enhancing the overall quality and 
consistency of the generated annotations. By addressing the 
dual challenges of efficiency and accuracy in annotation, our 
research stands to accelerate the advancements in NLP 
research and application development. 


II. RELATED WORK


Numerous tools are available for the purpose of text 
annotation, encompassing both web-based and desktop 
applications. Nonetheless, certain notable shortcomings have 
been identified within these existing tools. Among these tools
is doccano [7], a web-based annotation tool; however, it is
constrained by particular limitations. The labels it offers to
users may not align well with our specific domain, thereby
limiting its adaptability. Moreover, this tool lacks the
capacity to conduct sentiment analysis beyond the binary
division of positive and negative sentiments, which could be
limiting for projects requiring more nuanced sentiment
classification. Equally important, doccano does not
provide for aspect-based sentiment analysis, an
integral functionality in scenarios that require a detailed
exploration of sentiments corresponding to various aspects.
Similarly, fine-grained opinion term extraction
remains unaddressed by doccano. These
limitations collectively underscore the need for a more
customizable, versatile, and feature-rich text annotation tool
tailored to the distinct requisites of specific projects and
domains.

INCEpTION [8], another tool, is robust in its
capabilities but presents a higher degree of complexity than
simpler annotation tasks require. Notably, INCEpTION places a
significant emphasis on tasks involving named entity 
recognition and concept linking, which are inherently more 
intricate in nature. Regrettably, this complexity translates to 
a less intuitive user experience and poses challenges for users 
unfamiliar with the tool. Its multifaceted nature necessitates 
a more comprehensive understanding, making it less user- 
friendly for individuals seeking a tool that offers a more 
straightforward and streamlined approach. Furthermore, 
INCEpTION lacks built-in text preprocessing capabilities, 
which can be a limitation for users seeking a more integrated 
solution. Moreover, the configuration process of 
INCEpTION is notably challenging, particularly for 
individuals without a technical background. This complexity 
can hinder the tool's accessibility and utility for non-technical 
users who require a more user-friendly and straightforward 
annotation experience. 

Other tools include MonkeyLearn [9], which stands out for
its user-friendly platform, offering both pre-built models and 
customization options tailored to different industries. 
Lexalytics [10], another prominent contender, specializes in 
handling large volumes of text data, delivering insights into 
specific aspects as part of its sentiment analysis solutions. 
Brandwatch [11] emerges as a significant player in the field, 
offering a comprehensive social listening and analytics 
platform enriched with aspect-based sentiment analysis 
capabilities. This platform aids businesses in monitoring 
brand perceptions and comprehending sentiment nuances 
across various dimensions. RapidMiner [12], a powerful data 
science platform, provides users with visual tools for text 
analytics, including aspect-based sentiment analysis, 
facilitating model creation and deployment. 

Gavagai [13] delves into language analysis and excels in 
understanding customer feedback and opinions across 
different aspects, contributing to a deeper understanding of 
sentiment dynamics. Clarabridge [14], on the other hand, 
offers a comprehensive text analytics platform that 
incorporates aspect-based sentiment analysis. It is geared 
towards extracting insights from customer interactions, 
guiding businesses to refine their strategies based on these 
sentiments. 

MeaningCloud [15], in addition to its API, furnishes a 
commercial platform for text analytics, including aspect- 
based sentiment analysis. This platform offers tools that 
facilitate understanding customer opinions, shaping 
actionable insights. Luminoso's [16] focus on text analytics,
coupled with its aspect-based sentiment analysis capabilities, 
underscores its commitment to deriving insights from 
unstructured text data. 

Opinify [17] addresses the unique requirements of the e-commerce
and retail industries, homing in on aspect-based
sentiment analysis to decode product feedback and customer
sentiments. Lastly, Aspectiva [18], now under Walmart's
umbrella, employs AI-powered solutions to enhance the
shopping experience through the analysis of product reviews,
demonstrating a strong alignment with enhancing customer
satisfaction.

Collectively, these tools offer a spectrum of customization 
levels, integration possibilities, and industry-specific
solutions. As researchers and practitioners navigate this 
landscape, evaluating tools based on factors such as features, 
pricing, ease of use, and alignment with specific needs 
becomes pivotal for making informed decisions in the realm 
of aspect-based sentiment analysis. 


A. Deficiencies in Existing Annotation Tools:


1) Limited Customization 
Many existing text annotation tools offer predefined labels 
and categories, which might not align with the specific 
domain or context of the user's data. This lack of 
customization can hinder accurate and relevant annotation. 

2) Complexity 
Some tools, like INCEpTION, can be overly complex, 
making them challenging to understand and use, especially 
for users without technical expertise. This complexity may 
discourage potential users from effectively utilizing the tool. 

3) Lack of Intuitiveness 
A common issue is the lack of intuitive user interfaces in 
certain annotation tools. A user-friendly interface is crucial 
for efficient annotation, especially for those who are new to 
annotation tasks. 

4) Limited Sentiment Analysis 
Many tools might offer sentiment analysis, but with 
limitations. For instance, they might only support binary 
sentiment classification (positive/negative), lacking the 
capability to discern multiple sentiment levels or emotions. 

5) Aspect-based Sentiment Analysis 
Some tools might not support aspect-based sentiment 
analysis, which is crucial for tasks like identifying sentiments 
associated with specific aspects or features within a text. 

6) Opinion Term Extraction 
The fine-grained extraction of opinion terms and expressions 
might be lacking in some tools. This can be crucial for 
understanding nuanced opinions within texts. 

7) Text Preprocessing 
The absence of built-in text preprocessing capabilities in 
some tools can be a disadvantage. Preprocessing, such as 
removing HTML tags, URLs, and other noise, is essential to 
improve data quality before annotation. 

8) Configurability 
Difficulty in configuring the tool for specific use cases, 
especially for non-technical users, can limit its adoption. 
Tools that require extensive configuration can be time- 
consuming and frustrating for individuals with limited 
technical skills. 

9) Generic Labels
Some annotation tools might provide sentiment labels or
aspect categories that are not aligned with the user's domain or
project-specific needs. Customization can be challenging in
such cases.

10) Learning Curve
The steep learning curve associated with certain tools can
discourage users from investing time and effort in mastering
the tool, leading to suboptimal results and adoption
challenges.

11) Domain-Specific Nuances
Off-the-shelf tools might not be designed to handle the
specific nuances of the intended domain. They could miss
important details that a custom tool can address.


III. FEATURES


A. Domain-Specific Expertise and Focus:


Our system stands apart by catering exclusively to the 
specialized domain of sentiment analysis and text
categorization. This concentrated focus ensures that the tool's 
functionalities and features are aligned with the nuances and 
intricacies of sentiment-related tasks. 


B. Multi-Label Multi-Class Flexibility 


In contrast to many existing tools that often impose a binary 
or single-label approach, our system empowers users to 
assign multiple labels or classes to each text instance. This 
flexibility acknowledges the complexity of sentiments and 
enables a more accurate representation of text content. 


C. Aspect-Based Sentiment Analysis Precision 


One of the hallmarks of our system is its capability to perform 
aspect-based sentiment analysis. This advanced feature 
allows sentiments to be linked to specific aspects or attributes 
within a text, enabling users to dissect opinions in a highly 
granular manner. 


D. Fine-Grained Opinion Term Extraction 


Our system's prowess lies in its fine-grained opinion term 
extraction. This functionality goes beyond general sentiment 
identification and pinpoints precise terms and expressions 
that convey opinions. This depth of analysis provides richer 
insights into the sentiments expressed. 


E. Wide Spectrum of Sentiment Classes 


A standout attribute is our system's comprehensive sentiment 
classification, encompassing a broad spectrum of emotions: 
"Positive," "Neutral," "Negative," "Very Negative," and 
"Very Positive." This multidimensional approach captures 
the intricate nuances of sentiments expressed. 
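
Taken together, multi-label assignment, aspect-level sentiment, opinion terms, and the five sentiment classes suggest a simple record structure for one annotated review. The sketch below is only an illustration written in Python for brevity; ConAno itself is a Java application, and the class and field names used here (AspectAnnotation, ReviewAnnotation, opinion_terms) are assumptions, not the tool's internal data model or output format.

```python
from dataclasses import dataclass, field
from typing import List

# The five sentiment classes named in the text.
SENTIMENTS = ("Very Negative", "Negative", "Neutral", "Positive", "Very Positive")


@dataclass
class AspectAnnotation:
    """One aspect, its sentiment, and the opinion terms that express it."""
    aspect: str
    sentiment: str
    opinion_terms: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject labels that are not among the configured sentiment classes.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")


@dataclass
class ReviewAnnotation:
    """A review may carry several aspect annotations (multi-label)."""
    text: str
    aspects: List[AspectAnnotation] = field(default_factory=list)


# Hypothetical example review with two aspects and three opinion terms.
review = ReviewAnnotation(
    text="The pasta was delicious but the waiter was slow and rude.",
    aspects=[
        AspectAnnotation("food", "Positive", ["delicious"]),
        AspectAnnotation("service", "Negative", ["slow", "rude"]),
    ],
)
print(len(review.aspects), review.aspects[1].opinion_terms)
```

Keeping the opinion terms inside each aspect record, rather than in one flat list per review, preserves the link between a term and the aspect it describes.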


F. Tailored Annotation Configuration 


The configurability of our system is a significant asset. Users 
can tailor annotation parameters to mirror the requirements of 
their unique tasks. This customization encompasses defining 
bespoke labels, categories, and sentiment classes, ensuring 
that the tool adapts seamlessly to diverse use cases. 


G. User-Friendly and Intuitive Interface 


The user interface of our system is meticulously designed 
with ease of use in mind. Its intuitive layout and navigation 
streamline the annotation process. The interface fosters rapid 
familiarization, enabling both technical and non-technical 
users to leverage the tool effectively. 


H. Customizable Aspect Categories 


The aspect-based sentiment analysis module provides a 
canvas for users to define and customize categories and 
aspects according to their research objectives. This bespoke 
configuration empowers users to tailor analysis precisely to 
their areas of interest. 


I. Built-In Text Preprocessing 


Our system goes beyond annotation by integrating text
preprocessing capabilities. Functions like HTML tag 
removal, URL and email elimination, and punctuation 
cleaning enhance data quality before annotation even 
commences. 
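
The cleaning steps named above can be sketched with a few regular expressions. This is a minimal illustration in Python, chosen for brevity; it is not ConAno's Java implementation. The patterns and the function name preprocess are assumptions and are deliberately simple.

```python
import re
import string

# Simple illustrative patterns; real-world data may need stricter ones.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+")
EMAIL = re.compile(r"\S+@\S+\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Remove HTML tags, URLs, e-mail addresses, and punctuation."""
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    text = EMAIL.sub(" ", text)
    # Punctuation is stripped last so URLs and e-mails are still recognizable.
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = preprocess(
    "<p>Great food!</p> Visit http://example.com or mail me@example.com."
)
print(cleaned)  # Great food Visit or mail
```

The order of the steps matters: stripping punctuation first would destroy the "://" and "@" markers that the URL and e-mail patterns rely on.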


J. Simplified Configuration for All Users 


Our system's architecture emphasizes simplicity in
configuration. It caters to a diverse user base, including those 
without technical expertise. This inclusive approach removes 
barriers and empowers a wider range of users to harness the 
system's capabilities. 

By amalgamating these in-depth features, our system 
becomes an invaluable resource for professionals engaged in 
sentiment analysis and text annotation. Its tailored 
functionalities offer a holistic solution that aligns with the 
unique demands of sentiment-related tasks, marking it as a 
game-changer in the field. 


IV. IMPLEMENTATION 


The implementation of our research project encompasses the 
development of a Configurable Annotation Tool, proficiently 
crafted using the Java Swing framework. This tool is 
engineered to be universally compatible, capable of running 
on any machine equipped with both Java Development Kit 
(JDK) and Java Runtime Environment (JRE). The following 
sections provide a comprehensive breakdown of the tool's 
architecture, components, and functionalities. 


A. Tool Architecture 


Our Configurable Annotation Tool is designed to streamline 
the process of annotating text data for sentiment analysis 
tasks. Leveraging Java Swing, a user-friendly graphical user 
interface (GUI) is established, enabling efficient interaction 
with the tool. The tool follows a modular structure, with the 
core components outlined below: 


B. User Interface: 


The code creates a graphical user interface (GUI) using Java 
Swing components to provide an intuitive platform for 
annotators to interact with the annotation tool. The GUI 
includes several components, such as labels, text areas, 
buttons, and checkboxes, which are organized within a layout 
to make the interface user-friendly. A few of the GUI features are
shown in Fig. 1, Fig. 2, and Fig. 3.


[Screenshot: Simple Text Categorization or Sentiment Analysis]
Figure 1. GUI for review identification in ConAno.


[Screenshot: Simple Text Categorization or Sentiment Analysis with Opinion Term]
Figure 2. GUI for opinion term extraction in ConAno.


[Screenshot: Aspect Based Classification (ABSA) with Opinion Term]
Figure 3. GUI for ConAno displaying an error message for missing information.


C. Data Configuration: 


Before presenting the interface, the code performs data 
configuration. It loads configuration settings and state 
information from JSON files (config.json and state.json). 
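
A start-up step of this kind could look like the following minimal sketch, written in Python for brevity rather than in the tool's own Java. The paper does not describe the contents of state.json, so the "next_index" field and the choice to treat state.json as optional on a first run are assumptions made for illustration only; load_json and load_settings are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path


def load_json(path: Path, default=None):
    """Return parsed JSON from `path`, or `default` if the file is absent."""
    if not path.exists():
        if default is None:
            raise FileNotFoundError(f"required file not found: {path}")
        return default
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def load_settings(folder: Path):
    # config.json is treated as mandatory in this sketch.
    config = load_json(folder / "config.json")
    # "next_index" is a hypothetical state field, used only for illustration.
    state = load_json(folder / "state.json", default={"next_index": 0})
    return config, state


# Demonstration in a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "config.json").write_text(
        json.dumps({"domain": "product reviews"}), encoding="utf-8"
    )
    config, state = load_settings(folder)
    print(config["domain"], state["next_index"])
```

Failing early on a missing configuration file, while tolerating a missing state file, is one plausible design; the actual tool may behave differently.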

1) Configuration File for the Annotation Tool: 
A JSON file serves as the configuration file for the
annotation tool, as shown in Fig. 4. It specifies the following parameters
and settings required for annotating text data:

• "domain": Denotes the subject domain for the
configuration, which could represent the type of data
being analyzed, such as "product reviews."

• "input_data_path": Specifies the path to the input
data file (CSV format) that requires annotation.

• "Sentiment_Analysis_or_Text_Annotataion_under_classes":
This section defines whether simple
annotation or sentiment analysis under classes is to
be performed. If "Simple Annotation" is set to true,
annotation will be done without aspects. The
"Classes(Sentiments_or_aspects)" array lists the
classes (sentiments or aspects) that annotators will
work with.

• "Aspect_based_sentiment_Analysis": Contains
settings for aspect-based sentiment analysis.
"Aspect_based_Sentiment_Analysis" determines
whether this mode is active. The
"Categories_wrapper" contains settings for aspects
organized within categories. If
"wrap_aspects_in_categories" is set to true, aspects
are organized under categories specified in the
"categories" array, and their details are listed under
"aspects".

• "Classes(Sentiments_or_aspects)" and
"sentiments": These arrays define the classes
(sentiments or aspects) and sentiment labels that
annotators will assign. Modify these based on the
specific categories/aspects and sentiments relevant
to the analysis.

• "opinion_term": This Boolean value determines
whether opinion terms will be collected along with
aspect and sentiment annotations. If set to true,
clicking on a checkbox opens a secondary window
for annotating opinion terms.

• "split_Reviews_to_sentences": If set to true, the tool
will automatically split review texts into sentences
before annotation.

• "text_preprocessing": Configures various text
preprocessing options to be applied before
annotation. This includes removing HTML tags,
URLs, emails, mentions, punctuation, and specific
stopwords, and handling extra spaces.

With this comprehensive configuration file, users can tailor the
annotation tool to the specific requirements of their analysis,
whether they are performing simple annotation, sentiment
analysis under classes, or aspect-based sentiment analysis
with categorized aspects and sentiments.
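
How such a configuration might drive the annotation mode can be sketched as follows. The example is in Python for brevity and is not ConAno's Java code. The key spellings follow the descriptions above where they are legible, but the exact nesting of the keys, the function name resolve_labels, and the shape of the returned dictionary are assumptions. The "Simple Annotation" flag is ignored here; the sketch falls back to simple annotation whenever the aspect-based mode is not active.

```python
def resolve_labels(config: dict) -> dict:
    """Derive the annotation mode and label sets from a config dictionary."""
    opinion_term = bool(config.get("opinion_term", False))
    absa = config.get("Aspect_based_sentiment_Analysis", {})

    if absa.get("Aspect_based_Sentiment_Analysis", False):
        wrapper = absa.get("Categories_wrapper", {})
        if wrapper.get("wrap_aspects_in_categories", False):
            # Aspects grouped under their categories.
            aspects = {
                category: wrapper.get("aspects", {}).get(category, [])
                for category in wrapper.get("categories", [])
            }
        else:
            # Flat list of aspects, stored under a single unnamed group.
            aspects = {"": absa.get("Classes(Sentiments_or_aspects)", [])}
        return {
            "mode": "absa",
            "aspects": aspects,
            "sentiments": absa.get("sentiments", []),
            "opinion_term": opinion_term,
        }

    simple = config.get("Sentiment_Analysis_or_Text_Annotataion_under_classes", {})
    return {
        "mode": "simple",
        "classes": simple.get("Classes(Sentiments_or_aspects)", []),
        "opinion_term": opinion_term,
    }


# Hypothetical configuration with the aspect-based mode switched on.
example = {
    "Sentiment_Analysis_or_Text_Annotataion_under_classes": {
        "Classes(Sentiments_or_aspects)": ["Class A", "Class B"],
    },
    "Aspect_based_sentiment_Analysis": {
        "Aspect_based_Sentiment_Analysis": True,
        "Categories_wrapper": {
            "wrap_aspects_in_categories": True,
            "categories": ["Category A"],
            "aspects": {"Category A": ["Aspect 1", "Aspect 2"]},
        },
        "sentiments": ["Positive", "Neutral", "Negative"],
    },
    "opinion_term": True,
}
plan = resolve_labels(example)
print(plan["mode"], plan["aspects"])
```

Resolving the mode once, before the interface is built, keeps the GUI code free of repeated configuration lookups.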


V. USAGE 


Our Configurable Annotation Tool is primed to transform the 
annotation landscape by delivering enhanced annotation 
efficiency, consistency, and scalability. Notably, it offers: 


A. Intuitive Interface 


The user-friendly GUI fosters a comfortable and productive 
annotation experience, enabling annotators to focus on 
content without the burden of complex interactions. 


B. Efficiency and Consistency 


The tool minimizes the time and financial investments 
required for annotation efforts. By automating certain tasks 
and providing structured annotation options, it promotes 
consistent and accurate annotations. 


{
  "domain": "example domain",
  "input_data_path": "path/to/input/data.csv",
  "Sentiment_Analysis_or_Text_Annotataion_under_classes": {
    ...
    "Classes(Sentiments_or_aspects)": ["Class A", "Class B", "Class C"]
  },
  "Aspect_based_sentiment_Analysis": {
    "Aspect_based_Sentiment_Analysis": false,
    "Categories_wrapper": {
      "wrap_aspects_in_categories": ...,
      "categories": ["Category A", "Category B", "Category C"],
      "aspects": {
        "Category A": ["Aspect 1", "Aspect 2", "Aspect 3", "Aspect 4"],
        "Category B": ["Aspect X", "Aspect Y", "Aspect Z"],
        "Category C": ["Aspect I", "Aspect II", "Aspect III"]
      }
    },
    "Classes(Sentiments_or_aspects)": ["Class A", "Class B", "Class C"],
    "sentiments": ["Positive", "Neutral", "Negative", "Very Negative", "Very Positive"]
  },
  "opinion_term": false,
  "split_Reviews_to_sentences": false,
  "text_preprocessing": {
    ...
    "custom_stopwords_to_remove": ["the", "and", "of"],
    "remove_extra_spaces": true
  }
}


Figure 4. Configuration file details for customizing ConAno.


C. Scalability


The tool's modular design and streamlined interface make it
adaptable for varying dataset sizes and research requirements.


D. Gold Standard Corpus Development


The tool is built to assist in developing gold standard 
annotated corpora, which serve as a benchmark for training 
and evaluating NLP models. 


E. Data Privacy 


The tool adheres to strict data privacy guidelines, ensuring 
that no personal information is collected from users during 
the annotation process. 


VI. USE CASE: ANNOTATION WITH RESTAURANT REVIEWS


To demonstrate the practical effectiveness of our annotation 
tool, we undertook a comprehensive annotation endeavor 
involving a substantial dataset comprising 10,000 reviews. 
This annotation task was focused on aspect-based sentiment 
analysis, involving both explicit and implicit aspect 
extraction. Throughout the annotation process, our tool 
played a pivotal role in enhancing the efficiency and efficacy 
of the entire workflow. 

The tool's contribution was particularly evident in several key 
aspects: 


A. Learning Curve and Intuitiveness:


ConAno's user-friendly interface and intuitive design
significantly reduced the learning curve for annotators. The
straightforward navigation and clear labeling of features 
ensured that annotators, regardless of their technical 
expertise, quickly grasped the tool's functionalities. 

B. Configuration Simplicity 

ConAno's configurability is a distinguishing feature.
Configuring the tool for our specific annotation task was a
seamless process. Annotators could tailor all the
settings and parameters without the need for extensive
technical knowledge.


C. Efficient Annotation 


A notable advantage of our tool is its swift annotation
process. An individual review could be annotated in less than 
5 seconds on average. This efficiency is a result of the 
responsive and well-designed annotation workflow. 
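
To put the per-review figure in perspective, the short calculation below scales it to the 10,000-review dataset. It is a back-of-the-envelope sketch in Python, not a measurement: it takes 5 seconds as an upper bound for every review, and it assumes that the 20-25% time reduction stated in the abstract means tool time = (1 - r) × baseline time.

```python
REVIEWS = 10_000             # dataset size reported for the use case
MAX_SECONDS_PER_REVIEW = 5   # reported upper bound on the average time per review

total_seconds = REVIEWS * MAX_SECONDS_PER_REVIEW
total_hours = total_seconds / 3600
print(f"Annotation time: at most {total_seconds} s (about {total_hours:.1f} h)")

# Assumption: a reduction r means tool_time = (1 - r) * baseline_time,
# so the implied baseline is tool_time / (1 - r).
for r in (0.20, 0.25):
    baseline_hours = total_hours / (1 - r)
    saved_hours = baseline_hours - total_hours
    print(f"r = {r:.0%}: baseline about {baseline_hours:.1f} h, "
          f"saving about {saved_hours:.1f} h")
```

Under these assumptions, annotating all 10,000 reviews takes at most about 13.9 hours of pure annotation time, excluding configuration, breaks, and review of the labels.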


D. Performance and Seamlessness 


ConAno performed seamlessly, maintaining its
responsiveness even when dealing with a
substantial dataset. This reliability ensured that the
annotation process remained uninterrupted and efficient
throughout.


E. Quality Control 


Despite its high speed, ConAno maintained a high standard of
annotation quality. The user-friendly interface did not
compromise the precision and accuracy of the annotations.
The tool facilitated meticulous labeling and maintained a 
consistent level of annotation excellence. 


F. Time Efficiency 


The tool's efficiency was not only attributed to its 
annotation speed but also to the overall time saved in the 
configuration, navigation, and execution of the annotation 
tasks. This time-saving aspect added to the tool's appeal and 
practicality. 

Our annotation tool demonstrated its capability to handle 
large-scale aspect-based sentiment analysis annotation tasks 
with remarkable efficiency and precision. Its contribution to 
the workflow was evident in streamlining the process, 
enhancing learning, and maintaining high annotation 
quality. The tool's usability, configuration flexibility, and 
seamless performance established it as an invaluable asset 
for enhancing sentiment analysis endeavors. 


VII. CONCLUSION 


We developed ConAno, a specialized text annotation and
sentiment analysis tool that represents a significant advancement in
the field of sentiment analysis and text categorization. This
tool has been meticulously crafted to address the limitations 
of existing solutions and cater specifically to the nuanced 
requirements of sentiment-related tasks. By offering multi- 
label, multi-class annotation capabilities, aspect-based 
sentiment analysis, and fine-grained opinion term extraction, 
our system provides users with unparalleled insights into the 
sentiments expressed in textual data. The user-friendly 
interface, coupled with customizable configurations and 
built-in text preprocessing, ensures that both technical and 
non-technical users can harness the power of this tool to 
enhance their research and analysis endeavors. 

ConAno is a strong and innovative solution; however, there
is significant potential for further enhancements. For
instance, integrating advanced machine learning algorithms 
could automate sentiment analysis and enhance the accuracy 
of categorization and aspect extraction. Incorporating 
features for real-time collaboration could foster smoother 
teamwork on annotation projects. Additionally, extending the 
tool's adaptability to diverse domains would enhance its 
versatility and broaden its appeal. Further improvements in 
sentiment analysis could involve more refined techniques to 
capture subtle opinions and sentiments. Customizable 
visualization options would aid in the effective interpretation 
and presentation of analysis outcomes. Exploring semi- 
supervised learning techniques might allow for a more 
comprehensive analysis using limited annotated data. The 
incorporation of external resources such as lexicons and 
sentiment dictionaries could enrich sentiment classification 
accuracy. Ensuring comprehensive user training and support 
resources would empower users to fully harness the tool's 
potential. In essence, the development of this tool represents 
just the initial phase of an ongoing journey towards refining 
and expanding its capabilities to effectively address the 
evolving demands of sentiment analysis and text annotation. 
With dedication and innovation, this tool has the capacity to 
reshape the extraction and analysis of sentiment-related 
insights from textual data, offering a transformative approach 
to this field. 